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Abstract 

Genetic variants tliat predispose adults and the elderly to high blood pressure are largely unknown. We used a 
bivariate linear mixed model approach to jointly test the associations of common single-nucleotide polymorphisms 
with systolic and diastolic blood pressure using data from a genome-wide association study consisting of genetic 
variants from chromosomes 3 and 9 and longitudinal measured phenotypes and environment variables from 
unrelated individuals of Mexican American ethnicity provided by the Genetic Analysis Workshop 18. Despite the 
small sample size of a maximum of 131 unrelated subjects, a few single-nucleotide polymorphisms appeared 
significant at the genome-wide level. Simulated data, which was also provided by Genetic Analysis Workshop 18 
organizers, showed higher power of the bivariate approach over univariate analysis to detect the association of a 
selected single-nucleotide polymorphism with modest effect. This suggests that the bivariate approach to 
longitudinal data of jointly measured and correlated phenotypes can be a useful strategy to identify candidate 
single-nucleotide polymorphisms that deserve further investigation. 



Background 

High blood pressure is a common disorder in adults and 
the elderly and is associated with increased risk of cardi- 
ovascular diseases and many other morbidities [1]. It is a 
complex trait that can be influenced by certain genetic 
makeups, environmental factors, or their interactions. 
There have been some genetic association studies to 
identify the effect of genetic component on blood pres- 
sure in certain ethnic population. The San Antonio 
Family Studies (SAFS) is a family-based longitudinal 
study designed to identify genes associated with high 
blood pressure in Mexican American population. For 
Genetic Analysis Workshop 18 (GAW18), the data from 
SAFS was provided for odd-numbered chromosomes. 

In multivariate longitudinal data, multiple response 
variables are jointly measured over time from the same 
individuals. Other environmental variables can also 
change over time and could have been recorded. To 
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understand the relationship between independent and 
multivariate dependent response variables, we need to 
take into account the correlation between multivariate 
responses measured over time. In GAW18 data, the phe- 
notypic data, namely systolic blood pressure (SBP) and 
diastolic blood pressure (DBP), as well as some environ- 
mental data, were measured over time and these pheno- 
typic data from the same individuals are likely to be 
correlated as they might be regulated by the same genes 
or common environments. Therefore, we used a bivariate 
linear mixed-effect model approach to test the associa- 
tion of genetic variants with bivariate phenotypic values 
adjusting for environmental factors. 

Methods 

Participants, genotyping data, and quality control 

Details about sample recruitment can be found in Hunt 
et al [2]. For GAW18, data from 959 participants of 
Mexican American ethnicity from 20 large pedigrees, 
who were followed up for up to 4 times from 1991 to 
2011, were provided. However, for our analysis we 
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considered only 157 unrelated members from this 
cohort, of whom only 142 individuals had both genoty- 
pic and phenotypic data. 

We analyzed genome-wide association studies 
(GWAS) data from chromosomes 3 and 9. There were 
65,519 single-nucleotide polymorphisms (SNPs) on 
chromosome 3 and 42,177 SNPs on chromosome 9. We 
restricted our analysis to common SNPs (minor allele 
frequency >0.05) with genotyping call rates >0.95 and 
Hardy- Weinberg equilibrium p value >10"^ and subjects 
with genotyping call rates >0.95. We applied these cri- 
teria to those 142 individuals using PLINK. 



R,(s, t) = c/''""' at time t and covariance matrix R,(s, t] = c/''"'^ 
at times t and s, 0 < s < t. Two traits are independent 
if (Tui^ui^ = 0. Here, B is a 2 x 2 real matrix chosen such 
that the eigenvalues of B have negative real parts and 

matrices C and (CB + B'C) are positive semidefinite 

symmetric [3]. We have £(Yj)=X,/S and 
var{Yi) = ZiGiZ'f + i?; + Xj under independence assump- 
tion of Yi, Wi and £,'; thus 7,- - N {XiP, ZidZ- + J?,- + !:,■). 
Solution for = (y8\ p^y can be obtain by maximum like- 
lihood or restricted maximum likelihood (REML) 
approach using multivariate normal likelihood of Yj. 



Statistical method 

We applied a bivariate linear mixed model framework 
(more details can be found in Refs. [3,4]) in order to 
test for association between individual common genetic 
variant with SBP and DBP jointly. For trait k{k = 1,2), 
suppose Y^ is a («; x 1) vector of the trait values for Hi 
times of measurements for the subject i(i = 1, 2, . . . ,N); 
then, the univariate mixed-effect model with p indepen- 
dent variables with ej{cl <p) of them having random 
effects, can be expressed as [3,4] 



(1) 



where xj* is a (n, x p) design matrix that results in the 
systematic variation in the fe* trait with p'' as the corre- 
sponding (p X 1) vector of fixed-effect; Z^ is a («< x q) 
design matrix, usually a subset of {cf < p) that charac- 
terize the random variation in the trait with yf' ~ N(0, G^) 
as the corresponding [cj x l) vector of random effect; 
wf ~ N{0,R^) is a («j x 1) vector of the stochastic pro- 
cesses (within subject errors over repeated times) with 
realization w-'(t) at time t with variance r'I (t) = ct^, and 
covariance R'I (s,t) = cov{u/l (s) ,u/l{t)) = a^,,e^''^^''^ at 
times s and t, 0 < s < t; and e'- ~ N(0, cr^Jn.) is a («, x 1) 
vector of random errors, where /n, is an identity matrix. If 
the 2 traits and are correlated, then bivariate linear 
mixed model can be formulated as 



Y} 



Xj 0 
0 X? 



'Zj 0 

0 z? 



(2) 



That is, 

Ki ~ N (0, G) ; W, 

^12 

G ■ ■ ■ 



Here, 



Yi = XiP + ZiYi + Wi + Si, where, 
NiO,Ri};ei-^ N (0,1,1) ■ 

(='=j:);I, = E»,.,;E = (f 

is the Kronecker product. The W, is the bivariate sto- 
chastic processes that not only captures the correlation 
of measurements within the same subject at multiple 
times, but also the correlation between 2 traits at the 
same time for the subject, and has the variance matrix 



Data analysis 

After the quality control filtering, we had only 133 partici- 
pants available for bivariate analysis, who had genotype 
data and at least 1 measurement of phenotypic data. 
There were 52,862 SNPs from chromosome 3 and 34,475 
SNPs from chromosome 9 that passed the filtering criteria. 

For the bivariate linear mixed-effect model fitting for 
each SNP, genotype (0, 1, or 2 for the number of copies of 
minor allele) of a SNP was the independent variable of 
interest. Besides, we a priori selected to include 3 covari- 
ates, namely, measurement time in years at an examina- 
tion since enrollment (which is 0 for the first year of 
enrollment), baseline age (at enrollment), and repeatedly 
measured antihypertensive medication use in the model. 
We also included sex and repeatedly measured smoking 
status in the model because keeping them, each separately 
or together, resulted in a better fit (smaller Akaike infor- 
mation criteria [AIC]) of the bivariate model for the 20 
SNPs from chromosome 9 selected for model fit assess- 
ment (modeling techniques description given in a succes- 
sive paragraph). We considered autoregressive order-1 
[AR{1)) assumption used for repeated measured analysis 
and unstructured [UN] variance components used for the 
random-effect analysis to identify the appropriate covar- 
iance (or correlation) structure for between and within the 
2 phenotypic measurements over time [4,5]. The AR{1) 
assumption did not lead to the noticeably improved model 
fit but did involve some unrealistic assumptions in bivari- 
ate modeling, such as measurement at equal interval of 
time for both phenotype at a time and for a phenotype 
over time. Therefore, we assumed unstructured variance 
components assumption that allows correlations between 
any 2 measurements for the same phenotype and between 
2 phenotypes at a time to vary across subjects. 

There were missing follow-up data on blood pressure 
and other repeatedly measured covariates, where data 
from 97 subjects were missing in the fourth enrollment 
period. The available data for the fourth period were not 
used because discarding them resulted in much better fit 
for each of those 20 SNPs. Next, 2 subjects had missing 
data on medication use and smoking in all 3 examinations. 
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Thus, the effective sample size was 131 for real data analy- 
sis. No attempt was made to use imputation. 

A binary variable "BPTYPE" (blood pressure type) was 
defined as BPTYPE = 1 for SBP and = 0 for DBP for a sub- 
ject at a measurement time. Finally, equation (2) was fitted 
for each SNP, where the repeatedly measured bivariate 
blood pressure (a maximum of six possible measurements 
for two phenotypes at three examinations for a subject) 
was regressed against genotype and the covariates specified 
above (each regressor was multiplied by BPTYPE in order 
to perform bivariate analysis) using MIXED procedure in 
SAS (see Refs. [4,5] for details of modeling technique and 
SAS codes). In the analysis using REML, we considered 
BPTYPE having group effect, patients' ID having subject 
effects, and measurement time having random effect; that 
is, time effects on blood pressure varies across individuals. 

For the genotype effects fie = iPi = Pg'^'', Pi = P™^)' 
on SBP and DBP, we estimated = {^1,^2)' and the 

corresponding covariance matrix S = ( ^ 2^ ). ^1 and S2 

\Sl2 S2 ) 

being standard errors of ygj and ygj' respectively, and 5i2 
their covariance, per 1 copy increase in minor allele of 
each SNP, using the bivariate approach. For each SNP, 
we tested the null hypothesis Hq : /S = 0 vs. alternative 
Hi : ,S 7^ 0 (ie Hi : at leastl, ^j, 7^ 0; k = 1, 2; that is, the 
genotype was associated with at least 1 phenotype) 
using F-test with {&\ = 1,'&2 = 360), degrees of freedom 
[6], assuming multivariate normality of Pq. We also 
tested the hypothesis with x| test statistic assuming 
large sample approximation [6]. Because the data arose 
from GWAS, we used a genome-wide significance 
threshold, a = 7.2 x 10 [7], to adjust for multiple test- 
ing problems, which enables us to see if any SNPs from 
chromosome 3 or 9 achieve this threshold in bivariate 
analysis. 

Simulation 

We assessed the statistical power of the bivariate linear 
mixed effect model using 200 simulated longitudinal 
data sets provided by GAW18 organizers. However, we 
considered the data from only 142 unrelated subjects 
who had genotypic data and phenotype and covariate 
information from all 3 examinations. We chose to assess 
power to detect the association of a common SNP, 
rs6442089 (from MAP4 gene). The SNP had the effect 
sizes, Pi = -1.4951 (variation explained = 0.0117%) and 
P2 = -2.3810 (variation explained = 0.0143%) in simu- 
lated SBP and DBP, respectively, per copy increase in 
minor allele. We used data from 141 subjects as geno- 
type data was missing for the SNP in 1 subject. We 
employed the same regression analysis model and mod- 
eling technique, and assessed the same hypothesis using 
F (??i =1,§2 = 552) and /| test statistics as in real data 



analysis above. Power of univariate linear mixed model 
analysis to detect the effect of the same SNP separately 
on SBP and DBP . (Hq : h = OvsHi : Pk f^O,k = 1,2) 
was also assessed using the same regression model 
and assumption as in bivariate case using F 
= 1, ??2 = 137) and Xi test statistics. We assessed the 
power at a = 0.05 as there was no issue of multiple test- 
ing; however, we also used a = 7.2 x 10~* to be consis- 
tent with real data analysis. 

Results 

Real data analysis 

In our sample data, the mean age (standard deviation) at 
enrollment was 53.7 (16.0) years, where subjects were 
20.3 to 94.2 years old when enrolled. In a graphical 
inspection, the SBP and DBP data at each and all 3 
examinations looked approximately normal. 

Table 1 displays the results of the bivariate linear 
mixed model analysis using F-test for the first 15 most 
significant SNPs in each of chromosomes 3 and 9. The 
Manhattan plot and quantile-quantile (Q-Q) plot of the 
joint association p values for all SNPs are shown in 
Figures 1 and 2, respectively. The joint association 
p values using test statistic were in general slightly 
smaller but very similar to that from F statistic (the dif- 
ference was very small for the most significant SNPs, for 
instance, p value by /I for rs9632874 was smaller by 
1.98E-10 [results not shown]). Three SNPs, rsl2634258, 
rs7647249, rs4S33619 from intergenic region on 
chromosome 3, and 4 SNPs rs9632874, rsl233S766, 
rsl0122040 from gene TTC39B and rs7864652 from 
gene BNC2 on chromosome 9, appeared to be signi- 
ficant at genome-wide level. 

Simulation results 

The bivariate linear mixed model analysis had 76.5% 
power to detect the effect of rs6442089 jointly on SBP 
and DBP; whereas the separate univariate linear mixed 
model analyses had only 30.5% and 45.0% power to 
detect of effects of the same SNP on SBP and DBP, 
respectively at a = 0.05, using 141 unrelated subjects. At 
a = 7.2 X 10"* the association of SNP was detected in 
5 (2.5%) of simulated data sets in the bivariate analysis, 
but the corresponding univariate analysis did not detect 
the association with either of the phenotype in any 
simulated data set. The univariate analysis produced the 
smallest p value of 1.6 x 10"^ for SBP and that for DBP 
was slightly bigger (Table 2), whereas the bivariate ana- 
lysis resulted in smaller p values in 14.5% of the simu- 
lated data sets. 

Discussion 

In bivariate linear mixed model analysis, we observed asso- 
ciations of a few SNPs from intergenic regions and 
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Table ^ Top 15 most significant SNPs in the bivariate linear mixed model analysis. 
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BP, Base pair position; MAF, minor allele frequency; MiA, minor allele; P, joint association p value; SNP, single-nucleotide polymorphism. 



TTC39B gene with high blood pressure despite the small 
sample size. The bivariate mixed model framework to the 
GAW18 simulated data suggested that the bivariate analy- 
sis is more powerful than univariate approach to analyze 
longitudinal data when phenotypes are correlated. An ear- 
lier simulation study [8] also found that bivariate approach 
is in general more powerful than the univariate analysis 
when quantitative traits are correlated. High blood pres- 
sure is believed to be influenced by hundreds of genes 
with generally very small to modest effects. So we need 
statistical method with improved power to detect such 
associations and bivariate method to longitudinal data can 
be a useful strategy to identify the list of SNPs that can be 



followed up for replication or validation of their associa- 
tions with the phenotypes of interest. 

However, we wish to underscore the limitation that 
our simulation study was not extensive; we just com- 
pared the power of the bivariate approach with that of 
the univariate approach for only 1 SNP. We did not 
assess the type I error rate of the bivariate method, 
which could be inflated, especially for a small data set. 
In fact, the Q-Q plot (see Figure 2) of the joint associa- 
tion p values (see Figure 2) strongly suggests such infla- 
tion of the error rate, although the deviation from null 
distribution could also be an indication of the presence 
of population substructure or admixture. Also, one 
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Figure 1 Manhattan plot of joint association p values for variants on chromosomes 3 and 9. 
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Table 2 Top 3 p values and corresponding simulation data sets from bivariate and separate univariate mixed model 
analyses, and p values from other methods in the same simulation data sets. 



Analysis type 


p Value rank 


Simulation data set # 




p Value 
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needs to be cautious in interpreting our results from the 
real data analysis as we have a number of limitations in 
this study, including small sample size, missing observa- 
tions, uncertainty about the underlying genetic model, 
selection of an appropriate correlation structure, random 
effect assumptions, and choice of test statistic. For 
instance, although we chose a covariance (correlation) 
structure between AR{1) and UN, AR{1) had a similar or 
somewhat smaller AIC. However, it assumes that mea- 
surements were made at an equal interval over time for 
each and all phenotypes [4,5]. But this was an unrealistic 
assumption in our data because the time interval between 
the first and second examination ranged from 1.4 to 7.6 
years. Next, it also assumes the same correlation between 
measurements of 2 phenotypes at a time and between any 
2 measurements of all the phenotypes for all subjects, 
which might not be true. Although it involves estimating 
more parameters, a UN assumption is more flexible, con- 
sequently, we employed it our bivariate analysis. A detail 
investigation of the properties of bivariate method in 



many realistic scenarios via extensive simulation is war- 
ranted before we draw a general conclusion about the use- 
fulness of the bivariate method for the correlated 
repeatedly measured phenotypes. 

Conclusion 

A bivariate approach to test associations of genetic 
variants with multiple phenotypes jointly measured over 
time from the same individuals could be a useful 
strategy to identify genetic variants that deserve further 
investigation as it can exploit the correlation structure 
between phenotypes at a time and the same phenotype 
over time. 
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